Methods and apparatus for space efficient adaptive detection of multidimensional hierarchical heavy hitters

ABSTRACT

The present invention provides an efficient streaming method for detecting multidimensional hierarchical heavy hitters from massive data streams and enables near real-time detection of anomalous behavior in networks.

This application is a continuation of U.S. patent application Ser. No. 11/042,714, filed Jan. 24, 2005 (currently allowed), which claims the benefit of U.S. Provisional Application No. 60/538,720 filed on Jan. 23, 2004. All of the above-cited applications are herein incorporated by reference in their entirety.

The present invention relates generally to traffic monitoring and, more particularly, to a method and apparatus for identifying multidimensional hierarchical heavy hitters for monitoring one or more networks, e.g., packet switched communication networks such as VoIP networks.

BACKGROUND OF THE INVENTION

The Internet has emerged as a critical communication infrastructure, carrying traffic for a wide range of important scientific, business and consumer applications. Network service providers and enterprise network operators need the ability to detect anomalous events in the network, for network management and monitoring, reliability, security and performance reasons. While some traffic anomalies are relatively benign and tolerable, others can be symptomatic of potentially serious problems such as performance bottlenecks due to flash crowds, network element failures, malicious activities such as denial of service attacks (DoS), and worm propagation. It is therefore very important to be able to detect traffic anomalies accurately and in near real-time, to enable timely initiation of appropriate mitigation steps.

One of the main challenges of detecting anomalies is the sheer volume of traffic and measured statistics. Given today's traffic volume and link speeds, the input data stream can easily contain millions or more of concurrent flows, so it is often impossible or too expensive to maintain per-flow state. The diversity of network types further compounds the problem. Thus, it is infeasible to keep track of all the traffic components and inspect each packet individually for anomalous behavior.

Another major challenge for anomaly detection is that traffic anomalies often have very complicated structures: they are often hierarchical (i.e., they may occur at arbitrary aggregation levels like ranges of IP addresses and port numbers) and multidimensional (i.e., they can only be exposed when we examine traffic with specific combinations of IP address ranges, port numbers, and protocol). In order to identify such multidimensional hierarchical traffic anomalies, a naive approach would require examining all possible combinations of aggregates, which can be prohibitive even for just two dimensions. Another important challenge stems from the fact that existing change detection methods utilize usage measurements that are increasingly sampled.

Therefore, a need exists for a method and apparatus for near real-time detection of multidimensional hierarchical heavy hitters in packet-switched networks (e.g., Voice over Internet Protocol (VoIP) networks) that can also accommodate sampling variability.

SUMMARY OF THE INVENTION

In one embodiment, the present invention discloses an efficient streaming method and apparatus for detecting multidimensional hierarchical heavy hitters from massive data streams with a large number of flows. The data structure is adaptive to the offered traffic and carries a synopsis of the traffic in the form of a set of estimated hierarchical aggregates of traffic activity. The structure is adapted in that each aggregate contains no more than a given proportion of the total activity unless the aggregates are not further divisible.

This method has much lower worst-case update cost than existing methods, and provides deterministic accuracy that is independent of the offered data. In one embodiment, the invention provides a method for adjusting the threshold proportion for detection. Therefore, the level of reported detail can be traded off against the computational time. The invention also accommodates the inherent sampling variability within the predictive method.

BRIEF DESCRIPTION OF THE DRAWINGS

The teaching of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates an exemplary network related to the present invention;

FIG. 2 illustrates an example of a trie at the arrival of a packet;

FIG. 3 illustrates the trie of FIG. 2 after update for the packet;

FIG. 4 illustrates an example of a grid-of-trie data structure at the time of a packet arrival;

FIG. 5 illustrates the grid-of-trie of FIG. 4 after the update operation;

FIG. 6 illustrates the rectangular search structure before update;

FIG. 7 illustrates the rectangular search structure after update;

FIG. 8 illustrates the movement for the rectangular search operation;

FIG. 9 illustrates a flowchart of a method for detecting a multi-dimensional hierarchical heavy hitter; and

FIG. 10 illustrates a high level block diagram of a general purpose computer suitable for use in performing the functions described herein.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

The present invention broadly discloses a method and apparatus for detecting hierarchical heavy hitters. Although the present invention is discussed below in the context of detecting traffic anomalies in a network, the present invention is not so limited. Namely, the present invention can be applied in the context of data mining, trending, forecasting, outlier detection and the like. Furthermore, although the present invention is discussed below in the context of packets, the present invention is not so limited. Namely, the present invention can be applied in the context of records, fields, or any other unit or measure of data. For the purpose of scope, the term packet is intended to broadly include a record or a field.

To better understand the present invention, FIG. 1 illustrates an example network, e.g., a packet-switched network such as a VoIP network related to the present invention. The VoIP network may comprise various types of customer endpoint devices connected via various types of access networks to a carrier (a service provider) VoIP core infrastructure over an Internet Protocol (IP) based core backbone network. Broadly defined, a VoIP network is a network that is capable of carrying voice signals as packetized data over an IP network. An IP network is broadly defined as a network that uses Internet Protocol to exchange data packets.

The customer endpoint devices can be either Time Division Multiplexing (TDM) based or IP based. TDM based customer endpoint devices 122, 123, 134, and 135 typically comprise TDM phones or Private Branch Exchange (PBX). IP based customer endpoint devices 144 and 145 typically comprise IP phones or PBX. The Terminal Adaptors (TA) 132 and 133 are used to provide necessary interworking functions between TDM customer endpoint devices, such as analog phones, and packet based access network technologies, such as Digital Subscriber Loop (DSL) or Cable broadband access networks. TDM based customer endpoint devices access VoIP services by using either a Public Switched Telephone Network (PSTN) 120, 121 or a broadband access network via a TA 132 or 133. IP based customer endpoint devices access VoIP services by using a Local Area Network (LAN) 140 and 141 with a VoIP gateway or router 142 and 143, respectively.

The access networks can be either TDM or packet based. A TDM PSTN 120 or 121 is used to support TDM customer endpoint devices connected via traditional phone lines. A packet based access network, such as Frame Relay, ATM, Ethernet or IP, is used to support IP based customer endpoint devices via a customer LAN, e.g., 140 with a VoIP gateway and router 142. A packet based access network 130 or 131, such as DSL or Cable, when used together with a TA 132 or 133, is used to support TDM based customer endpoint devices.

The core VoIP infrastructure comprises several key VoIP components, such as the Border Element (BE) 112 and 113, the Call Control Element (CCE) 111, and VoIP related servers 114. The BE resides at the edge of the VoIP core infrastructure and interfaces with customer endpoints over various types of access networks. A BE is typically implemented as a Media Gateway and performs signaling, media control, security, call admission control and related functions. The CCE resides within the VoIP infrastructure and is connected to the BEs using the Session Initiation Protocol (SIP) over the underlying IP based core backbone network 110. The CCE is typically implemented as a Media Gateway Controller and performs network wide call control related functions as well as interacts with the appropriate VoIP service related servers when necessary. The CCE functions as a SIP back-to-back user agent and is a signaling endpoint for all call legs between all BEs and the CCE. The CCE may need to interact with various VoIP related servers in order to complete a call that requires certain service specific features, e.g., translation of an E.164 voice network address into an IP address.

Calls that originate or terminate in a different carrier can be handled through the PSTN 120 and 121 or the Partner IP Carrier 160 interconnections. Originating or terminating TDM calls can be handled via existing PSTN interconnections to the other carrier. Originating or terminating VoIP calls can be handled via the Partner IP carrier interface 160 to the other carrier.

In order to illustrate how the different components operate to support a VoIP call, the following call scenario is used to illustrate how a VoIP call is set up between two customer endpoints. A customer using IP device 144 at location A places a call to another customer at location Z using TDM device 135. During the call setup, a setup signaling message is sent from IP device 144, through the LAN 140, the VoIP Gateway/Router 142, and the associated packet based access network, to BE 112. BE 112 will then send a setup signaling message, such as a SIP-INVITE message if SIP is used, to CCE 111. CCE 111 looks at the called party information and queries the necessary VoIP service related server 114 to obtain the information to complete this call. If BE 113 needs to be involved in completing the call, CCE 111 sends another call setup message, such as a SIP-INVITE message if SIP is used, to BE 113. Upon receiving the call setup message, BE 113 forwards the call setup message, via broadband network 131, to TA 133. TA 133 then identifies the appropriate TDM device 135 and rings that device. Once the call is accepted at location Z by the called party, a call acknowledgement signaling message, such as a SIP-ACK message if SIP is used, is sent in the reverse direction back to the CCE 111. After the CCE 111 receives the call acknowledgement message, it will then send a call acknowledgement signaling message, such as a SIP-ACK message if SIP is used, toward the calling party. In addition, the CCE 111 also provides the necessary information of the call to both BE 112 and BE 113 so that the call data exchange can proceed directly between BE 112 and BE 113. The call signaling path 150 and the call data path 151 are illustratively shown in FIG. 1. Note that the call signaling path and the call data path are different because once a call has been set up between two endpoints, the CCE 111 does not need to be in the data path for actual direct data exchange.

Note that a customer in location A using any endpoint device type with its associated access network type can communicate with another customer in location Z using any endpoint device type with its associated network type as well. For instance, a customer at location A using IP customer endpoint device 144 with packet based access network 140 can call another customer at location Z using TDM endpoint device 123 with PSTN access network 121. The BEs 112 and 113 are responsible for the necessary signaling protocol translation, e.g., SS7 to and from SIP, and media format conversion, such as TDM voice format to and from IP based packet voice format.

The above VoIP network is described to provide an illustrative environment in which a large quantity of packets may traverse throughout the entire network. It would be advantageous to be able to detect anomalous events in the network to monitor performance bottlenecks, reliability, security, malicious attacks and the like. In order to do so, it would be advantageous to first detect “heavy hitters”. In one embodiment, the present multi-dimensional hierarchical heavy hitter detection method as discussed below can be implemented in an application server of the VoIP network.

In order to clearly illustrate the present invention, the following packet network related concepts will first be described. These concepts are that of:

a. A Heavy Hitter (HH);

b. A Hierarchical Heavy Hitter (HHH);

c. A multidimensional hierarchical heavy hitter;

d. A child node;

e. A fringe node; and

f. An internal node.

A Heavy Hitter (HH) is an entity that accounts for at least a specified proportion of the total activity measured in terms of number of packets, bytes, connections etc. A heavy hitter could correspond to an individual flow or connection. It could also be an aggregation of multiple flows/connections that share some common property, but which themselves may not be heavy hitters.

Of particular interest to packet network application is the notion of hierarchical aggregation. IP addresses can be organized into a hierarchy according to prefix. The challenge for hierarchical aggregation is to efficiently compute the total activity of all traffic matching relevant prefixes.

A hierarchical heavy hitter is a hierarchical aggregate that accounts for some specified proportion of the total activity.

Aggregations can be defined on one or more dimensions, e.g., source IP address, destination IP address, source port, destination port, and protocol fields for IP flows.

Multidimensional Heavy Hitters are multidimensional sets of hierarchical aggregates that account for some specified proportion of the total activity.

In one embodiment, the invention is illustrated with a data network structure used to identify address prefixes in an IP network. Each node is associated with a prefix. A child of a node shares the prefix of the parent node but has an additional bit specified (i.e., if the parent's prefix is p*, the child's prefix is either p0* or p1*). Generally, the bit “0” is associated with the child created first and the path from the parent node points towards the left. Bit “1” is associated with the child created second and the path from the parent node points to the right.

Fringe nodes are nodes with no descendant. Internal nodes have 1 or 2 child nodes (one child associated with bit 0 and one child associated with bit 1).

The major challenges for detection of anomalies are the volume of traffic and the complicated structures of the traffic. This invention provides a method for identifying multidimensional Hierarchical Heavy Hitters. The candidate traffic is then further analyzed for abnormal behavior.

In order to assist the reader, the invention will first provide the definition of multidimensional hierarchical heavy hitters and introduce the heavy hitter detection problem.

The Cash Register Model is adopted to describe the streaming data. Let I=α₁,α₂,α₃, . . . be an input stream of items that arrive sequentially. Each item α_(i)=(k_(i),u_(i)) consists of a key k_(i) and a positive update u_(i).

Associated with each key k is a time varying signal A[k]. The arrival of each new data item (k_(i),u_(i)) causes the underlying signal A[k_(i)] to be updated: A[k_(i)]+=u_(i).

DEFINITION 1 (HEAVY HITTER). Given an input stream I={(k_(i),u_(i))} with total sum

$SUM = \sum\limits_{i}u_{i}$

and a threshold φ (0≦φ≦1), a Heavy Hitter (HH) is a key k whose associated total value in I is no smaller than φSUM. More precisely, let

$v_{k} = \sum\limits_{i:k_{i} = k}u_{i}$

denote the total value associated with each key k in I. The set of Heavy Hitters is defined as {k|v_(k)≧φSUM}.

The heavy hitter problem is the problem of finding all heavy hitters, and their associated values, in a data stream. For instance, if the destination IP address is the key, and the byte count is the value, then the corresponding HH problem is finding all the destination IP addresses that account for at least a proportion φ of the total traffic.
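As a minimal sketch of Definition 1, the following Python function computes the exact set of heavy hitters by keeping one counter per key. It is only an illustration of the definition, not the streaming method of the invention, since per-key state is precisely what the invention seeks to avoid. The function name, the sample addresses, and the byte counts are hypothetical.

```python
from collections import defaultdict

def heavy_hitters(stream, phi):
    """Return {key: v_k} for every key whose total value is at least phi * SUM."""
    values = defaultdict(int)
    for key, update in stream:      # cash register model: A[k] += u
        values[key] += update
    total = sum(values.values())    # SUM
    return {k: v for k, v in values.items() if v >= phi * total}

# Hypothetical stream of (destination IP address, byte count) items.
stream = [("10.0.0.1", 600), ("10.0.0.2", 100), ("10.0.0.1", 200), ("10.0.0.3", 100)]
result = heavy_hitters(stream, phi=0.25)
print(result)
```

Here SUM is 1000 and the threshold φSUM is 250, so only the address with 800 bytes qualifies.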

DEFINITION 2 (HIERARCHICAL HEAVY HITTER). Let I={(k_(i),u_(i))} be an input stream whose keys k_(i) are drawn from a hierarchical domain D of height h. For any prefix p of the domain hierarchy, let elem(D,p) be the set of elements in D that are descendants of p. Let

$V(D,p) = \sum\limits_{k \in elem(D,p)}v_{k}$

denote the total value associated with any given prefix p. The set of Hierarchical Heavy Hitters (HHH) is defined as {p|V(D,p)≧φSUM}.

The hierarchical heavy hitter problem is defined as the problem of finding all hierarchical heavy hitters, and their associated values, in a data stream. If the destination IP address is used to define the hierarchical domain, then the corresponding HHH problem is defined as the problem of not only finding the destination IP addresses but also identifying all the destination prefixes that account for at least a proportion φ of the total traffic.

The invention provides a method for finding all the HH prefixes, including the descendants of p. The method can be adapted and used for a stricter definition of HHH. In one embodiment, the invention uses a simpler definition to perform change detection on HHHs and avoids missing big changes buried inside the prefixes that would not be tracked under the stricter definition.
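Definition 2 can be illustrated with a small exact computation. The sketch below assumes keys are fixed-length bit strings, so that every leading substring is a prefix p, and it accumulates V(D,p) for each prefix of each arriving key. The function name and the 3-bit sample keys are hypothetical, and the exhaustive per-prefix update is for illustration only.

```python
from collections import defaultdict

def hierarchical_heavy_hitters(stream, phi):
    """Return {prefix: V(D, prefix)} for every prefix with V >= phi * SUM."""
    volume = defaultdict(int)
    total = 0
    for key, update in stream:
        total += update
        # Every prefix of the key, from the zero-length prefix to the full key.
        for length in range(len(key) + 1):
            volume[key[:length]] += update
    return {p: v for p, v in volume.items() if v >= phi * total}

# Hypothetical 3-bit keys with byte counts.
stream = [("100", 5), ("101", 3), ("000", 2)]
result = hierarchical_heavy_hitters(stream, phi=0.5)
print(result)
```

With SUM equal to 10 and φ equal to 0.5, the prefixes *, 1*, 10* and 100 reach the threshold of 5, while 101, 0*, 00* and 000 do not.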

DEFINITION 3 (MULTIDIMENSIONAL HIERARCHICAL HEAVY HITTER). Let D=D₁× . . . ×D_(n) be the Cartesian product of n hierarchical domains D_(j) of height h_(j) (j=1,2, . . . , n). For any p=(p₁,p₂, . . . , p_(n))∈D, let elem(D,p)=elem(D₁,p₁)× . . . ×elem(D_(n),p_(n)). Given an input stream I={(k_(i),u_(i))}, where k_(i) is drawn from D, let

$V(D,p) = \sum\limits_{k \in elem(D,p)}v_{k}.$

The set of Multidimensional Hierarchical Heavy Hitters is defined as {p|V(D,p)≧φSUM}.

The multidimensional hierarchical heavy hitter problem is defined as the problem of finding all multidimensional hierarchical heavy hitters, and their associated values, in a data stream. As an example, define D based on source and destination IP addresses. The corresponding 2-dimensional HHH problem is to find all those source-destination prefix combinations (p₁,p₂) that account for at least a proportion φ of the total traffic.
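The 2-dimensional case can be sketched in the same exhaustive way. The code below assumes both dimensions use fixed-length bit-string keys and credits each update to every pair of a source prefix and a destination prefix. The function names and the 2-bit sample keys are hypothetical.

```python
from collections import defaultdict
from itertools import product

def prefixes(bits):
    """All prefixes of a bit string, including the zero-length prefix."""
    return [bits[:n] for n in range(len(bits) + 1)]

def hhh_2d(stream, phi):
    """Return {(p1, p2): V(D, (p1, p2))} for every pair with V >= phi * SUM."""
    volume = defaultdict(int)
    total = 0
    for (src, dst), update in stream:
        total += update
        # elem(D, p) is a Cartesian product, so every prefix pair is credited.
        for pair in product(prefixes(src), prefixes(dst)):
            volume[pair] += update
    return {p: v for p, v in volume.items() if v >= phi * total}

# Hypothetical items: ((source bits, destination bits), bytes).
stream = [(("10", "01"), 6), (("11", "01"), 3), (("00", "10"), 1)]
result = hhh_2d(stream, phi=0.5)
print(sorted(result.items()))
```

Each item touches (h₁+1)(h₂+1) prefix pairs, 9 in this toy example, which hints at why exhaustive tracking becomes expensive for full-length IP addresses.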

Once the multidimensional hierarchical heavy hitters have been detected in each time interval, their values are tracked across time to detect significant changes, which may indicate potential anomalies. This is referred to as the change detection problem.

The present invention discloses efficient and accurate streaming methods for detecting hierarchical heavy hitters and significant changes in massive data streams that are typical of today's IP traffic. This is accomplished by identifying all possible keys that have a volume associated with them that is greater than the heavy-hitter detection threshold at the end of the time interval. In the context of network traffic, a key can be made up of fields in the packet header and it may be associated with very large ranges. For example, in the case of IP prefixes the range is [0,2³²). Also, the key may be a combination of one or more fields, which can result in a significant increase in the complexity of the problem. Clearly, monitoring all possible keys in the entire range can be prohibitive.

The invention provides a method that builds an adaptive data structure. The data structure dynamically adjusts the granularity of the monitoring process to ensure that the particular keys that are heavy-hitters (or more likely to be heavy-hitters) are correctly identified without wasting a lot of resources (in terms of time and space) for keys that are not heavy-hitters. The data structure resembles a decision tree that dynamically drills down and starts monitoring a node (that is associated with a key) closely only when its direct ancestor becomes sufficiently large.

The invention uses two key parameters: φ and ε. Given the total sum SUM, φSUM is the threshold for a cluster to qualify as a heavy hitter, and εSUM bounds the amount of traffic that may be missed for any cluster.

To guide the building process of the summary data structure, a threshold is used. The threshold will be referred to as the split threshold, T_(split). T_(split) is used to make local decisions at each step and determine when the range of keys under consideration should be looked at in a finer grain. It is chosen to ensure that the maximum amount of traffic that can be missed during the dynamic drill-down is at most εSUM for any cluster. The actual choice of T_(split) depends on the method. The invention provides a method that specifies T_(split) in terms of the actual total sum in a given interval. In one embodiment, the invention assumes that SUM is a pre-specified constant.

To exemplify the teachings of the invention, let the source and destination IP addresses be the dimensions for multidimensional HHH detection and let the metric to be used for detecting the heavy-hitters be the volume of traffic (e.g., number of bytes) associated with a given key. Note that the metric as well as the fields to be considered for the dimensions may be changed based on the application requirements.

In traditional anomaly detection methods, given an n-dimensional hierarchical network, such as illustrated in FIG. 2, a scheme is used to transform the multidimensional HHH detection problem to essentially multiple non-hierarchical HH detection problems, one for each distinct combination of prefix length values across all the dimensions of the original key space.

For an n-dimensional key space with a hierarchy of height h_(i) in the i-th dimension, there are

$\prod\limits_{i = 1}^{n}(h_{i} + 1)$

non-hierarchical HH detection problems, which have to be solved in tandem. Such a brute force approach needs to update the data structure for all possible combinations of prefix lengths and requires extensive resources. So, the per item update time is proportional to

$\prod\limits_{i = 1}^{n}(h_{i} + 1).$

Two variants of the brute force approach that differ from each other only in the method used to detect the HHs are provided for illustrative and comparative purposes. The results of the two brute force methods are referred to as Baseline Variant 1 and Baseline Variant 2 as described below:

-   Baseline variant 1: Sketch-based solution (sk), which uses sketch-based probabilistic HH detection. Count-Min sketch is a probabilistic summary data structure based on random projections. Let [m] denote the set {0,1, . . . , m−1}. A sketch S consists of an H×K table of registers: T_(S)[i, j] (i∈[H], j∈[K]). Each row T_(S)[i,·] (i∈[H]) is associated with a hash function h_(i) that maps the original key space to [K]. The data structure can be viewed as an array of hash tables. Given a key, the sketch allows one to reconstruct the value associated with it, with probabilistic bounds on the reconstruction accuracy. The achievable accuracy is a function of both the number of hash functions (H) and the size of the hash tables (K). This method uses a separate sketch data structure per distinct prefix length combination in all the dimensions.
-   Baseline variant 2: Lossy Counting-based solution (lc), which uses a deterministic single-pass, sampling-based HH detection method called Lossy Counting. Lossy Counting uses two parameters: ε and φ, where 0≦ε<<φ≦1. At any instant, let N be the total number of items in the input data stream. Lossy Counting can correctly identify all heavy-hitter keys whose frequencies exceed φN. lc provides lower and upper bounds on the count associated with a heavy hitter. The gap between the two bounds is guaranteed to be at most εN. The space overhead for the method is

$O\left( \frac{1}{\epsilon}\log(\epsilon N) \right).$

The Lossy Counting method can be modified to work with byte data instead of count data. All the complexity and accuracy results still apply except that N is replaced by SUM. This adapted version is used by the current invention for evaluation. In the worst-case scenario, the performance of lc is indicative of the worst-case performance of any other method based on Lossy Counting.

Unlike the brute force methods, the current invention utilizes an Adaptive Decision Tree (ADT) to identify the source and destination prefixes (used for IP as the keys) that are responsible for an amount of traffic that exceeds a given threshold. The invention provides a method to identify the prefixes associated with the multidimensional heavy hitters while maintaining minimal state data and performing a minimum number of update operations for each arriving flow of traffic or packet.
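For reference, the count-based form of Lossy Counting can be sketched as below. This follows the commonly described bucket-based formulation (bucket width ⌈1/ε⌉, one entry of count f and maximum undercount Δ per tracked key) and is not the byte-based adaptation used for the evaluation. The function names and the synthetic stream are hypothetical.

```python
import math

def lossy_counting(stream, epsilon):
    """Single pass over keys; returns ({key: [f, delta]}, N)."""
    width = math.ceil(1 / epsilon)       # bucket width
    entries = {}
    n = 0
    for key in stream:
        n += 1
        bucket = math.ceil(n / width)    # id of the current bucket
        if key in entries:
            entries[key][0] += 1
        else:
            entries[key] = [1, bucket - 1]   # delta: largest possible missed count
        if n % width == 0:               # bucket boundary: prune small entries
            for k in [k for k, (f, d) in entries.items() if f + d <= bucket]:
                del entries[k]
    return entries, n

def report(entries, n, phi, epsilon):
    """Keys with f >= (phi - epsilon) * N, each with (lower, upper) count bounds."""
    return {k: (f, f + d) for k, (f, d) in entries.items() if f >= (phi - epsilon) * n}

# Synthetic stream: key "a" makes up 60 of 100 items, the rest are distinct keys.
stream = ["a" if i % 5 < 3 else f"x{i}" for i in range(100)]
entries, n = lossy_counting(stream, epsilon=0.1)
result = report(entries, n, phi=0.5, epsilon=0.1)
print(result)
```

The reported pair brackets the true count, and the gap between the two bounds is at most εN, matching the guarantee stated above.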

The hierarchical nature of the problem is similar to the classical IP lookup problem in which for every received packet the IP destination field in the packet header is used to search for a longest matching prefix in a set of given IP prefixes (also known as a routing table). The difference between the current problem and the IP lookup problem is that in the IP lookup problem case the set of prefixes is given as an input and is often static. In contrast, the current method needs to generate the set of prefixes that are associated with the multidimensional heavy hitters dynamically, based on the packet arrival pattern.

The multidimensional nature is also similar to packet classification problems. Source and destination IP addresses, port addresses and protocols are typical dimensions for packet applications. Cross-producing techniques are typically used to deal with the multidimensional nature of problems for packet classification.

The current invention utilizes an ADT for the dynamic case to adapt the methods that have been used for the static IP lookup problem. It also adapts the cross-producing technique for detection of multidimensional HHH.

In order to illustrate the teachings of the present method, the 1-d HHH detection problem is first considered.

FIG. 2 illustrates the one-bit trie data structure at the time of a packet arrival. A standard trie data structure starts with a single node trie that is associated with the zero-length prefix. Each node in a one-bit trie has at most two child nodes, one associated with bit 0 and the other with bit 1. The path directed towards the child associated with bit 0 is generally directed to the left of the parent node. The path directed to the right of the parent node is associated with bit 1.

The trie data structure and the present invention are extendable to m-bits. For an m-bit trie, each node of the trie has 2^(m) children, similar to the idea of the multi-bit tries used for IP lookup problems. However, for simplicity the present invention is described using one-bit tries.

FIG. 2 illustrates an example of a trie 200 at the arrival of a packet. To illustrate, dotted circles 205 and 215 represent internal nodes. Solid circles 210, 220 and 225 represent the fringe nodes. The links to the child nodes associated with bit 0 are 210 and 220. The links to the child nodes associated with bit 1 are 215 and 225. For example, the addressing for node 225 would start with 11 and the addressing for node 220 would start with 10. The volumes for all the nodes are shown inside of the circles.

The present invention maintains a standard trie data structure that starts with a node that is associated with a zero-length prefix. The volume field associated with that node is incremented with the size of each arriving packet. When the value in this field exceeds T_(split), the node is marked as internal and a new child node associated with the prefix 0* or 1* that the incoming packet matches is created. The size of the current packet is then used to initialize the volume field in the newly created child node. The structure develops dynamically with the arrival of each new packet. The implementation also includes some special handling when the bottom of the trie is reached (i.e., when all bits in the key are used). In one illustrative example, the update operation is illustrated for a trie with T_(split) set to 10.

FIG. 3 shows the trie after an update operation is completed. To illustrate, the arriving packet has a Destination IP prefix of 100* and a size of 5 bytes. The method first performs a longest matching prefix operation on the trie and arrives at the node associated with prefix 10*. Adding 5 bytes to the volume field of this node would make its value exceed T_(split). Therefore, the method creates a new node associated with prefix 100* (i.e., the child node associated with bit 0). The size of the current packet is used to initialize the volume field of the newly created node. After the update, the fringe node 220 in FIG. 2 becomes an internal node 112. The new child (fringe) node 310 is indicated in FIG. 3.
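The update operation just described can be sketched in Python as follows. This is one possible reading of the text rather than the patented implementation. It assumes bit-string keys and treats the bottom of the trie (all key bits used) by simply accumulating there. A packet that cannot be absorbed creates children until it can, which also yields the full new branch for oversized packets. The `Node` class, the `update` function and the sample stream are hypothetical and do not reproduce the volumes of FIG. 2.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    volume: int = 0
    internal: bool = False
    children: dict = field(default_factory=dict)   # bit "0" or "1" -> Node

def update(root, key_bits, size, t_split):
    """Add one packet and return the prefix of the node that absorbed it."""
    node, depth = root, 0
    # Longest matching prefix: follow existing children as far as possible.
    while depth < len(key_bits) and key_bits[depth] in node.children:
        node = node.children[key_bits[depth]]
        depth += 1
    # Drill down while the packet cannot be absorbed at the current node.
    while depth < len(key_bits) and (node.internal or node.volume + size > t_split):
        node.internal = True
        child = Node()
        node.children[key_bits[depth]] = child
        node, depth = child, depth + 1
    node.volume += size
    return key_bits[:depth]

root = Node()
stream = [("100", 7), ("110", 6), ("100", 5), ("101", 4), ("100", 5)]
absorbed = [update(root, key, size, t_split=10) for key, size in stream]
print(absorbed)
```

In this run the last packet, with key 100 and size 5, reaches the node for 10* holding volume 9. Adding 5 would exceed the threshold of 10, so 10* becomes internal and a new fringe node 100* starts at volume 5, mirroring the step shown in FIG. 3.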

As illustrated, the invention's trie construction process guarantees that the value of the volume field in any internal node is always less than T_(split). Let W denote the maximum depth of the trie. Because a path from the root to the fringe passes through at most W internal nodes, setting T_(split)=εSUM/W ensures that the maximum amount of traffic missed as the method dynamically drills down to the fringe is at most εSUM.

The time complexity of the operations described above is on the same order of magnitude as a regular IP lookup operation. For every packet arrival, at most one node in the trie is updated. At most one new node is created during each update as long as the volume for the new item is below T_(split) (in case the volume exceeds T_(split), an entire new branch all the way to the maximum depth W is created). At each depth, there can be no more than SUM/T_(split)=W/ε internal nodes (otherwise the total sum over all the subtries rooted at those nodes would exceed SUM, which is impossible). So the worst-case memory requirement of the data structure is O(W²/ε).
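As a worked instance of this bound, assume W=32 (IPv4 address bits) and ε=0.01. These values are chosen only for illustration. The counting argument then gives

$\frac{SUM}{T_{split}} = \frac{SUM}{\epsilon\, SUM / W} = \frac{W}{\epsilon} = \frac{32}{0.01} = 3200$

internal nodes at most per depth, and over all W depths

$W \cdot \frac{W}{\epsilon} = \frac{W^{2}}{\epsilon} = \frac{32^{2}}{0.01} = 102400$

internal nodes at most, independent of SUM and of the number of distinct keys in the stream.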

As illustrated in FIGS. 2 and 3, every packet arrival results in at mostone update. The update occurs at the node which is the most specificnode representing the destination IP prefix (of the packet) at the timeof the packet arrival. Therefore the volumes of the internal nodes needto be reconstructed at the end of the time interval. By delaying thereconstruction process to the end of the time interval, thereconstruction cost is amortized across the entire time interval. Tocompute the volumes associated with all the internal nodes, a recursivepost-order traversal of the trie is performed. In each recursive stepthe volume of the current node is computed as being the sum of thevolume represented in the current trie node and its child nodes.
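The end-of-interval reconstruction can be sketched as a recursive post-order traversal. The `Node` class, the `reconstruct` function and the small hand-built trie below are hypothetical; the trie holds per-node volumes of 7, 6, 9 and 5 along the path *, 1*, 10*, 100*.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    volume: int = 0
    children: dict = field(default_factory=dict)   # bit "0" or "1" -> Node

def reconstruct(node, prefix="", totals=None):
    """Post-order traversal: children first, then the node itself."""
    if totals is None:
        totals = {}
    total = node.volume
    for bit, child in sorted(node.children.items()):
        reconstruct(child, prefix + bit, totals)
        total += totals[prefix + bit]
    totals[prefix] = total
    return totals

# Hand-built example trie.
root = Node(7, {"1": Node(6, {"0": Node(9, {"0": Node(5)})})})
totals = reconstruct(root)
print(totals)
```

Each node is visited once, so the cost is linear in the size of the trie and is paid only once per time interval.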

Note that because T_(split) is used to guide the trie construction process, the volumes represented in the internal nodes even after reconstruction are not entirely accurate. In order to more accurately estimate the volume associated with a given node, an estimate of the missed traffic for that node needs to be included. Below, three ways of estimating the missed traffic are considered:

-   Copy-all: the missed traffic for a node N is estimated as the sum of the total traffic seen by the ancestors of node N in the path from node N to the root of the tree. Note that copy-all is conservative in that it copies the traffic trapped at a node to all its descendants. It always gives an upper bound for the missed traffic. Since the update operation maintains the invariant that every internal node N has volume below T_(split), the estimate given by the copy-all rule is further upper bounded by the product of the depth of the node and T_(split).
-   No-copy: this is the other extreme that optimistically assumes the amount of missed traffic to be 0.
-   Splitting: the total contribution of missed traffic by a node N is split among all its children C in proportion to the total traffic for C. Essentially what this assumes is that the traffic patterns before and after the creation of a node are very similar, so missed traffic is predicted by proportionally splitting the traffic trapped at a node among all its children.

Both the copy-all and the splitting rule can be easily implemented by traversing the trie in a top-down fashion.
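The three rules can be sketched as a single top-down pass. This is one possible reading, with assumptions stated plainly: the node layout and function names are hypothetical, and the splitting rule is interpreted here as passing down a node's own trapped volume plus its inherited estimate, divided in proportion to each child's reconstructed total.

```python
def set_totals(node):
    """Bottom-up helper: reconstructed total volume of each subtree."""
    node["total"] = node["volume"] + sum(set_totals(c) for c in node["children"].values())
    return node["total"]


def estimate_missed(node, copy_all=0.0, split=0.0):
    """Top-down pass; `copy_all` and `split` are the estimates inherited from the ancestors."""
    node["missed_no_copy"] = 0.0        # no-copy: assume nothing was missed
    node["missed_copy_all"] = copy_all  # copy-all: everything trapped at the ancestors
    node["missed_split"] = split        # splitting: proportional share (assumed reading)
    children = list(node["children"].values())
    children_total = sum(c["total"] for c in children)
    for child in children:
        share = child["total"] / children_total if children_total else 0.0
        estimate_missed(child,
                        copy_all + node["volume"],
                        (split + node["volume"]) * share)


def leaf(volume):
    return {"volume": volume, "children": {}}


trie = {"volume": 9,
        "children": {"0": leaf(3),
                     "1": {"volume": 5, "children": {"0": leaf(2)}}}}
set_totals(trie)
estimate_missed(trie)
```

For every node the three values satisfy no-copy ≤ splitting ≤ copy-all under this reading, which matches the roles of lower bound, estimate and upper bound described above.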

Once the estimate for the missed traffic is available, it is combined with the total amount of observed traffic and the resulting sum is used as an input for the HHH detection. The accuracy will depend on the method selected.

In one embodiment, the present invention provides a method for handling the 2-dimensional HHH problem by adapting the cross-producing technique. The high level concept is to execute the 1-dimensional method for each of the dimensions (IP destination and IP source) and to use the lengths associated with the longest matching prefix nodes in each of the dimensions as an index into a data structure that holds the volume data for the 2-dimensional HHHs.

In one embodiment, the present invention maintains three data structures. Two tries are used to keep track of the 1-dimensional information. An array H of hash tables of size W×W is used to keep track of the 2-dimensional tuples. A tuple <p₁, p₂> consists of the longest matching prefixes in the two dimensions. The array is indexed by the lengths of the prefixes p₁ and p₂. For example, in the case of IPv4 prefixes with a 1-bit trie-based solution, W=32.

For every incoming packet the individual 1-dimensional tries are updated, which return the longest matching prefix in each of the dimensions. This yields two prefixes p₁ and p₂ with lengths l₁ and l₂, respectively. Next the two lengths are used as an index to identify the hash table H[l₁][l₂]. The tuple <p₁, p₂> is then used as a lookup key in the hash table H[l₁][l₂]. Subsequently, the volume field of the entry associated with the key is incremented. This process is repeated for every arriving packet.

For every packet three update operations are performed: one operation in each of the two 1-dimensional tries, and one operation in at most one of the hash tables. This results in a very fast method. The memory requirement in the worst case is O((W²/ε)²) = O(W⁴/ε²), due to the use of cross-producing. But in practice, we expect the actual memory requirement to be much lower.
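The per-packet steps can be sketched as below. This is a simplified illustration under stated assumptions: the names `new_node`, `trie_update` and `update_2d` are hypothetical, keys are bit strings, the hash tables are held in a dictionary indexed by the pair of prefix lengths, and the special handling of items larger than T_(split) is omitted for brevity.

```python
from collections import defaultdict


def new_node():
    return {"volume": 0, "children": {}}


def trie_update(root, bits, size, t_split):
    """Simplified 1-d update; returns the longest matching prefix after the update."""
    node, depth = root, 0
    while depth < len(bits) and bits[depth] in node["children"]:
        node = node["children"][bits[depth]]
        depth += 1
    if depth < len(bits) and node["volume"] + size >= t_split:
        child = new_node()  # split: the new child becomes the longest match
        node["children"][bits[depth]] = child
        node, depth = child, depth + 1
    node["volume"] += size
    return bits[:depth]


def update_2d(dst_trie, src_trie, H, dst_bits, src_bits, size, t_split):
    """Three updates per packet: one in each trie and one in a single hash table."""
    p1 = trie_update(dst_trie, dst_bits, size, t_split)
    p2 = trie_update(src_trie, src_bits, size, t_split)
    H[(len(p1), len(p2))][(p1, p2)] += size  # table selected by the two prefix lengths


dst_trie, src_trie = new_node(), new_node()
H = defaultdict(lambda: defaultdict(int))
update_2d(dst_trie, src_trie, H, "100", "011", 6, t_split=10)
update_2d(dst_trie, src_trie, H, "100", "011", 6, t_split=10)
```

The first packet is recorded under the pair of empty prefixes in H[0][0]; the second triggers a split in both tries and is recorded in H[1][1].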

The next step is to reconstruct the volumes for the 2-d internal nodes. To compute the total volume for the internal nodes, the volume for each element in the hash tables is added to all its ancestors. This can be implemented by scanning all the hash elements twice. During the first pass, for every entry e represented by key <p₁, p₂> (where p₁ and p₂ represent prefixes) and with prefix lengths <l₁, l₂>, the method adds the volume associated with e to its left parent in the hash map, represented by key <ancestor(p₁), p₂> and lengths <l₁−1, l₂>. Note that the process starts from entries with the largest l₁ and ends with entries with the smallest l₁. Then in the second pass, the method adds the volume to the right parent represented by the key <p₁, ancestor(p₂)> and lengths <l₁, l₂−1>. This time the process starts from entries with the largest l₂ and ends with entries with the smallest l₂.
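The two passes can be sketched as follows, assuming (for illustration only) that prefixes are bit strings, that the parent of a prefix is obtained by dropping its last bit, and that H is a dictionary mapping a length pair to a table of volumes. Missing parent entries are created on demand, which is also an assumption.

```python
def reconstruct_2d(H, W):
    """H maps (l1, l2) -> {(p1, p2): volume}; prefixes are bit strings of those lengths."""
    def add(l1, l2, key, vol):
        table = H.setdefault((l1, l2), {})
        table[key] = table.get(key, 0) + vol

    # Pass 1: push each volume to its left parent, longest destination prefixes first.
    for l1 in range(W, 0, -1):
        for l2 in range(W + 1):
            for (p1, p2), vol in list(H.get((l1, l2), {}).items()):
                add(l1 - 1, l2, (p1[:-1], p2), vol)
    # Pass 2: push each volume to its right parent, longest source prefixes first.
    for l2 in range(W, 0, -1):
        for l1 in range(W + 1):
            for (p1, p2), vol in list(H.get((l1, l2), {}).items()):
                add(l1, l2 - 1, (p1, p2[:-1]), vol)
    return H


H = {(2, 1): {("10", "0"): 4},
     (1, 1): {("1", "0"): 3},
     (0, 0): {("", ""): 1}}
reconstruct_2d(H, W=2)
```

Processing the longest prefixes first means each entry already holds the volume of its own descendants before it is pushed to its parent, so after both passes the root entry holds the total of all recorded traffic.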

As in the 1-d case, the next step is to estimate the missed traffic for each node. For each key (recall that the key is made up of the destination prefix and the source prefix) in the hash table, the method traverses the individual tries to find the prefix represented by the key and returns the missed traffic estimate obtained from the node (by applying either the copy-all or the splitting rule as described above). The missed traffic is then estimated as the maximum of the two estimates returned by the two 1-d tries. Using the maximum preserves the conservativeness of copy-all.

The scheme using the Cross-Producing technique is very efficient in time; however, it can be potentially memory intensive in the worst case. The present invention overcomes this drawback by adapting two other methods for two-dimensional packet classification to this problem: Grid-of-Tries and Rectangle Search.

The two methods are Grid-of-Tries and Rectangle Search. Just like Cross-Producing, both Grid-of-Tries and Rectangle Search have been applied in the packet classification context. Conceptually, each node can be viewed as a rule, so finding nodes on the fringe becomes a packet classification problem.

However, most packet classification methods are optimized for a relatively static rule set (through pre-computation), whereas in the context of detection of HHH, there may be a need to dynamically maintain the fringe set. This may involve updating n nodes and possibly creating n new nodes. Despite the clear difference, both the Grid-of-Tries and Rectangle Search methods are adapted to solve the current problem. An illustration is provided here only to show the basic idea and highlight the main difference.

FIGS. 4 and 5 illustrate an exemplary grid-of-tries data structure. The grid-of-tries data structure contains two levels of tries. The first level is associated with the IP destination prefixes in the classifier (a predefined rule set) while the second level tries are associated with IP source prefixes in the classifier.

For every valid prefix (P₁) node in the first level trie there is a pointer to a second level trie. The second level trie is created using all the prefixes (P₂) for which there is a rule <P₁, P₂> in the classifier. As in the 1-dimensional HHH detection case, the grid-of-tries data structure is dynamically built based on the packet arrival pattern.

In order to construct the grid-of-tries for 2-d HHH detection, each node in the data structure contains a pointer to each of its children. In addition, each node in the first-level trie maintains a pointer to a second-level trie and each node in the second-level trie maintains a jump pointer for fast trie traversal. Note that there is only one first-level trie, but multiple second-level tries. Specifically, there is a second-level trie for each node in the first-level trie. Each node also stores a volume field associated with the volume of traffic that corresponds to all the packets having a prefix equal to the prefix of the node from the moment that the node is created until the moment when new child nodes are associated with the node.

If the existence of a current grid-of-tries structure is assumed at the given moment, new nodes and tries may be appended to the current grid-of-tries with the arrival of a new packet. First, a Longest Matching Prefix (LMP) operation is executed in the first-level trie (using the destination prefix). A fringe node is always identified. Then, as in the case of the 1-dimensional trie method, if the volume associated with this node becomes greater than T_(split), a new child node is created and associated with this node. As in the 1-d method, the size of the current packet is used to initialize the volume field for the newly created child node. In addition to adding child nodes in the first-level trie, the 2-d method must also initialize and associate a new second-level trie with each one of these newly created children. These second-level tries, when first created, are only initialized with a root node. The size of the current packet is used to increment the volume associated with the second-level trie that is associated with the new LMP in the first-level trie.

The arrival of a packet may also result in a situation where the node represented by the LMP in the second-level trie exceeds T_(split). In this case a new child is created and associated with this node in the second-level trie in a way similar to the 1-dimensional HHH detection node creation process.

Every packet that arrives may contribute to multiple updates in the volume field of the nodes in the second dimension tries. To illustrate the update process, consider the example in FIG. 4, and the arrival of a packet with destination IP prefix 000* and source IP prefix 111* with a size of 4 bytes. T_(split) is set to 10 for this illustration.

FIG. 4 represents the grid-of-tries data structure at the time of the packet arrival. Nodes 410, 411, 412 and 413 are in the first level trie. Nodes 420, 421, 422 and 423 are in the second level trie. A second-level trie is associated (connected by dotted lines in the figure) with each node in the first level trie. The dashed lines represent jump pointers (which are always between nodes with the same source prefix).

The grid-of-tries data structure after the update operation is illustrated in FIG. 5. The nodes to which the size of the current packet is added are shown in grey. As in FIG. 4, the dashed lines represent jump pointers (which are always between nodes with the same source prefix).

For the moment ignore the dotted lines in the figure. This arriving packet contributes to a modification in the value of the volume field in each one of the second-dimension tries associated with the LMP node in the first dimension and all ancestors of this LMP node. The nodes that are affected by the update are shown in grey. To walk through the process, first an LMP operation is done in the first-level trie using the first prefix 000*, and the value of the volume field associated with this LMP node is incremented. The next step is to follow the pointer to the second-level trie. Again the method does an LMP operation in the second-level trie using the second prefix 111*. The search terminates with the node for prefix 1*. If the size of the current packet were added to the volume associated with this node, it would increase beyond T_(split). Therefore a new child node is created for this node. The size of the current packet is used to initialize the volume associated with the new child node for prefix 11*, as this new node now represents the LMP. The method must also update the second level tries associated with all the less specific prefixes of 000*, namely 00*, 0* and *.

In order to provide a fast update operation, each fringe node in the second-level trie contains a pre-computed jump pointer. Each fringe node in a second-level trie T₂ for prefix P₂ originating at prefix P₁ in the first-level trie maintains a jump pointer to the same prefix P₂ in a second-level trie that is associated with the direct ancestor of P₁. Note that the jump pointer discussed here can be maintained dynamically: whenever a node in the second-level trie associated with P₁ is created, a node for the second-level trie associated with the direct ancestor of P₁ (if not already present) is also created. Utilizing jump pointers keeps the time complexity within O(W), as during the update process the method can avoid having to restart the longest prefix matching problem at the root of every second-level trie (recall that the method needs to update every second-level trie associated with all ancestors of the longest matching prefix node in the path between the node and the root of the first-level trie).

To ensure that the method only misses εSUM traffic in the worst case, the method also sets T_(split)=εSUM/(2W). The space requirement is O(W²·(2W)/ε)=O(2W³/ε).

The Rectangle Search is another method provided by the current invention to detect multidimensional HHHs. FIG. 6 illustrates the Rectangle Search data structure before an update.

Conceptually, Rectangle Search does exactly the same thing as Grid-of-Tries: updating all the elements on the fringe and expanding it whenever necessary. The major difference lies in how the method locates all the elements on the fringe. Grid-of-Tries does so using jump pointers. In the worst case, it requires 3W memory accesses, where W is the width of the key. Rectangle Search uses hash tables instead and requires 2W (hashed) memory accesses in the worst case. In the figure, the fringe nodes are in dark shade, and the internal nodes are in light shade. When a new tuple <k₁, k₂> (with value v) arrives, the process starts from the bottom left corner and moves towards the upper right corner. In the illustrated example, T_(split) is set to 10, so a new element gets created.

The basic data structure for Rectangle Search is a set of hash tables arranged into a 2-dimensional array. More specifically, for each destination prefix length l₁ and source prefix length l₂, there is an associated hash table H[l₁][l₂]. Initially only H[0][0] contains an element <*,*> with volume 0.

The update operation for a new tuple <k₁, k₂> (with value v) is illustrated in FIGS. 6, 7 and 8. First consider the case when v is below T_(split), which is the common case as the total number of elements above T_(split) is limited. The method starts with (l₁,l₂)=(0,W) (the lower left corner in FIG. 8). During each step, the method checks if tuple <p₁, p₂> belongs to the hash table H[l₁][l₂], where p_(i)=prefix(k_(i),l_(i)). If <p₁, p₂> does not exist in H[l₁][l₂], the method simply decrements l₂ by 1 (i.e., moves upwards in FIG. 8) and continues. Otherwise, the method has found an element e. If e is a fringe node and its volume+v is below T_(split), then the method simply adds v to the volume of e. Otherwise, either e is already an internal node (as a result of updates to other descendants of e) or it should become one after this update. In either case, a new element is created with key <p₁, prefix(k₂,l₂+1)> and value v, and it is inserted into H[l₁][l₂+1]. In case l₂=0 and e becomes a new internal node, the method expands the fringe towards the right by creating an element with the key <prefix(k₁,l₁+1), p₂> and inserting it into H[l₁+1][l₂]. The next step is to increment l₁ by 1 and continue (i.e., move towards the right in FIG. 8). The method terminates whenever either l₁>W or l₂<0. Since during each step the method either increments l₁ by one or decrements l₂ by one, the method takes at most 2W−1 steps to terminate.

When v is above T_(split), the steps are virtually identical, except that for each l₁ the method needs to insert one element with value 0 into each hash table H[l₁+1][j] (l₂<j<W) and then one element with value v into hash table H[l₁][W]. In the worst case, this may create (W+1)² new elements. But since the number of elements above T_(split) is small (below SUM/T_(split)), the amortized cost is quite low.

Just like Grid-of-Tries, Rectangle Search requires O(2W³/ε) space to guarantee an error bound of εSUM.

In all the methods described so far, whenever the method receives an item <k₁, k₂> with value v above T_(split), it creates state for all its ancestors <p₁, p₂> if they do not already exist. Such express expansion of the fringe has the advantage that it leads to less missed traffic for the fringe nodes and thus higher accuracy. However, it also requires a lot of space, especially when T_(split) is very small and there are a large number of items with value above it (this can happen, for instance, when the maximum depth of the trie is large). The invention introduces a simple technique, lazy expansion, to significantly reduce the space requirement.

The basic idea for lazy expansion is very simple. Whenever a large item with value v satisfying v/T_(split) ∈ [k−1,k] is received, it is split into k smaller items, each with value v/k<T_(split), and k separate updates are performed. Since each item is below T_(split), it will lead to the creation of no more than W elements. So long as k<W, the invention is guaranteed to reduce the space requirement while still achieving the same deterministic worst-case accuracy guarantee. Meanwhile, the method can modify the update operation to batch k updates together (by taking into account the multiplicities of the item). This avoids any increase in the update cost.
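A small sketch of the splitting step follows. The function name `lazy_split` is hypothetical, and k is chosen here as the smallest integer for which v/k is strictly below T_(split), which is one way to satisfy the condition stated above.

```python
import math


def lazy_split(value, t_split):
    """Split one large item into k equal pieces, each strictly below t_split."""
    k = math.floor(value / t_split) + 1  # smallest k with value / k < t_split
    return [value / k] * k


parts = lazy_split(25, t_split=10)  # three pieces of 25/3 each
small = lazy_split(4, t_split=10)   # already below the threshold: one piece
```

Each piece can then be passed to the ordinary update operation, or the k pieces can be batched as a single update with multiplicity k, as noted above.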

So far all our methods assume a fixed value for SUM. For many online applications, however, it may be desirable to set the threshold as a fraction of the actual total traffic volume for the current interval, which is not known until all the traffic is seen. In one embodiment, the invention has a strategy to first use a small threshold derived from some conservative estimate of the total traffic (i.e., a lower bound), and to increase the threshold when a large amount of additional traffic is seen. Note that as the threshold is increased, all the nodes that should no longer exist under the new threshold are removed. The method refers to this as the "compression" operation.

The invention maintains a lower bound and an upper bound of the actual sum (SUM). Whenever the actual sum exceeds the upper bound, it performs the compression operation and then doubles the upper bound. The compression operation simply walks through the trie in a top-down manner and removes the descendants of all the fringe nodes (according to the new threshold). The compression methods are more involved in the 2-d case, but the high-level idea is applicable. We make the following observations:

-   In the worst case, compression can double the space requirement. It also adds some computational overhead. But the number of compression operations only grows logarithmically with the value of SUM. In practice, a reasonable prediction of the actual sum based on past history can be obtained. So typically a very small number of compressions is needed.
-   Compression can potentially provide a better accuracy bound. In particular, a node can potentially get created sooner than with a larger threshold, so the amount of missed traffic can be lower (but in the worst case, the accuracy guarantee still remains the same).
-   Compression also makes it possible to aggregate multiple data summaries (possibly for different data sources or created at different times or locations). For example, in the 1-d case, to merge two tries, the method just needs to insert every node in the second trie into the first trie, update the total sum and detection threshold, and then perform the compression operation (using the new detection threshold). Such aggregation capability can be very useful for applications like detecting distributed denial-of-service attacks.
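The 1-d compression walk might be sketched as follows. Two points are assumptions of this sketch rather than statements from the description: a node is treated as a fringe node under the new threshold when the total volume of its subtree is below the new T_(split), and the volume of the removed descendants is folded into that node so that no observed traffic is lost.

```python
def subtree_total(node):
    return node["volume"] + sum(subtree_total(c) for c in node["children"].values())


def compress(node, t_split):
    """Top-down walk: collapse every subtree that would not have split under t_split."""
    total = subtree_total(node)
    if total < t_split:
        # Assumed rule: this node is a fringe node under the new threshold.
        node["volume"], node["children"] = total, {}
        return
    for child in node["children"].values():
        compress(child, t_split)


def leaf(volume):
    return {"volume": volume, "children": {}}


trie = {"volume": 3,
        "children": {"0": {"volume": 2, "children": {"0": leaf(1)}},
                     "1": leaf(20)}}
compress(trie, t_split=5)
```

Merging two summaries, as described in the last observation, would then amount to inserting the nodes of one trie into the other and calling the same walk with the new threshold.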

In one embodiment, the present invention discloses a 5-d HHH detection for network anomaly detection. In this example, Rectangle Search and Grid-of-Tries can be used as building blocks to solve the general n-dimensional HHH detection problem and always result in a factor of W improvement over the brute-force approach. However, this may still be too slow for some applications.

Fortunately, for many practical applications, the general HHH detection in all the fields is not needed. In the 5-d HHH context, the invention needs to handle 5 fields: (source IP, destination IP, source port, destination port, protocol). For protocol an exact match is typically required (TCP, UDP, ICMP, others). For source or destination port, the method can construct some very fat and shallow trees. For instance, it can use a 3-level tree, with level 0 being * (i.e., don't care), level 1 being the application class (Web, chat, news, P2P, etc.), and level 2 being the actual port number. In addition, it typically only needs to match on one of the port numbers (instead of their combination). Finally, it typically only cares about port numbers for TCP and UDP protocols. Putting all these together, it often suffices to just consider a few combinations in the context of network anomaly detection. For each combination, an array of grid-of-tries has to be updated.
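The fat and shallow port hierarchy can be illustrated with a short sketch. The mapping table `APP_CLASS` and the function `port_path` are hypothetical and cover only a few well-known ports; a deployment would supply its own classification.

```python
# Hypothetical port-to-class table; only a few well-known ports are listed.
APP_CLASS = {80: "Web", 443: "Web", 119: "news", 6667: "chat"}


def port_path(port):
    """Return the 3-level generalisation path: don't care, application class, port."""
    return ["*", APP_CLASS.get(port, "other"), str(port)]


web = port_path(80)
unknown = port_path(12345)
```

Because the tree has only three levels, a port field contributes a small constant number of ancestors, in contrast to the W levels of an IP prefix field.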

In the context of network applications one often needs to deal with tens of millions of network time series, and it is infeasible to apply standard techniques on a per time series basis. Others have used sketch-based change detection, but it works well only when there is a single fixed aggregation level. If it is applied to find changes at all possible aggregation levels, the method must take a brute-force approach and run one instance of sketch-based change detection for every possible aggregation level, which can be prohibitive.

The current invention provides a method to perform scalable change detection for all possible aggregation levels by using the HHH detection method as a pre-filtering mechanism. The basic idea is to extract all the HHH traffic clusters using a small HHH threshold φ in our HHH detection methods, reconstruct time series for each individual HHH traffic cluster, and then perform change detection for each reconstructed time series. Intuitively, if a cluster never has much traffic, then it is impossible for it to experience any significant (absolute) changes. The method captures most big changes so long as the HHH threshold φ is sufficiently small.

A major issue the invention addresses is how to deal with the reconstruction errors introduced by the summary data structure. The picture is further complicated by the increasing use of sampling in network measurements, which introduces sampling errors to the input stream. Lack of effective mechanisms to accommodate such errors can easily lead to false alarms (i.e., detection of spurious changes). The current change detection method can accommodate both types of errors in a unified framework. It is quite general and can be applied to any linear forecast model.

Clearly, it is prohibitive to keep state for all possible clusters. One possible solution is to start per-cluster monitoring after a cluster becomes a heavy hitter (with a sufficiently small filtering threshold). This approach has been used in a different context for accounting purposes. The present invention uses an alternative approach, which is to reconstruct values from the data summaries and perform change detection from such reconstructed time series. One advantage of this approach is that the method can potentially perform change detection on (either spatially or temporally) aggregated data summaries. For example, in a distributed environment, it can combine data summaries for traffic at different locations and then perform change detection on the aggregated data. Such capability can be potentially very useful for detecting anomalies like distributed denial-of-service (DDoS) attacks. It can be difficult for the per-cluster monitoring approach to achieve the same effect, as a cluster may be a heavy hitter in some locations but not in others.

The current invention addresses the following major issues in detecting changes using the summary structure. The techniques are very general and can be easily applied to any linear forecast model.

Extracting the time series is the first step for time series analysis. In this context, a cluster may not appear in all summary data structures all the time.

The summary data structure introduces uncertainty in that the true volume for a cluster may lie anywhere between the lower bound and the upper bound that are obtained from the data summary. Such uncertainty may further accumulate during forecasting. The method needs to quantify the cumulative level of uncertainty in order to accommodate it in the detection criteria.

Sampling is increasingly used in network measurement and introduces errors in the input data stream. The present invention provides an analysis framework that is flexible enough to accommodate such errors.

Below, the method provided by the invention is presented in the context of one specific change detection method: Holt-Winters, which has been successfully applied in the past for anomaly detection. Given a time series, the forecast model maintains a separate smoothing baseline component and a linear trend component. Big changes can be detected by looking for data points that significantly deviate from the forecast. For online change detection, it is common to maintain an exponentially weighted moving average.

Given a traffic cluster in an interval, the summary data structure produces three different values by using different rules to calculate the amount of missed traffic: a lower bound using the no-copy rule, an upper bound using the copy-all rule, and an estimate using the splitting rule. The splitting rule often gives the most accurate estimate. Therefore, the method uses the time series obtained with the splitting rule as the input for the Holt-Winters forecast model to obtain estimates for the true forecast errors and detection thresholds. It also uses the no-copy and copy-all rules to obtain tight bounds on the true forecast errors.

One issue is the presence of missing clusters. A cluster may not appear in the summary structure for every interval. When this happens, the method still has to estimate its associated traffic volume, otherwise there will be a gap in the reconstructed time series. Fortunately, the summary structure allows it to conveniently obtain such estimates. For example, given a 2-d missing cluster with key <p₁, p₂>, conceptually all it needs to do is to insert a new element with key <p₁, p₂> and value 0 into the summary data structure, which will result in one or more newly created fringe nodes. The method can then obtain estimates for the first newly created fringe node and use them as the corresponding estimates for <p₁, p₂>. After this, it can then remove all the newly created nodes through compression. Note that in the final implementation, the method does not need to actually create the new fringe nodes and then remove them; it just needs to do a lookup to find the first insertion position.

At first glance, one might compute the baseline and linear trend components recursively to obtain bounds on the forecast error. Unfortunately, reconstruction errors can accumulate exponentially with this approach and cause the bounds to be too loose to be useful. In one embodiment, the present invention obtains tight bounds by directly representing the baseline and linear trend components as linear combinations of the true traffic volume and then incorporating the bounds. In addition, the solution ignores the remote past, as it has very little effect on predicting the future. The method therefore only needs to keep the state for the most recent few intervals for each flow.

In one embodiment, the present invention discloses a method for obtaining bounds on forecast errors. Let the use of superscripts ^(L) and ^(U) on a variable denote the lower and upper bounds for the variable, respectively. For example, X_(i) ^(L) denotes the lower bound for X_(i). Below, the present invention shows how to compute E_(i) ^(L) and E_(i) ^(U), the bounds for the true forecast errors E_(i).

A naïve solution: At first glance, it seems rather straightforward to compute E_(i) ^(L) and E_(i) ^(U): we can directly recursively compute bounds for S_(i) and T_(i) and then use them to form bounds for F_(i) and E_(i). More specifically, we have

$S_{i}^{U} = aX_{i-1}^{U} + (1 - a)(S_{i-1}^{U} + T_{i-1}^{U})$

$S_{i}^{L} = aX_{i-1}^{L} + (1 - a)(S_{i-1}^{L} + T_{i-1}^{L})$

$T_{i}^{U} = \beta(S_{i}^{U} - S_{i-1}^{L}) + (1 - \beta)T_{i-1}^{U}$

$T_{i}^{L} = \beta(S_{i}^{L} - S_{i-1}^{U}) + (1 - \beta)T_{i-1}^{L}$

$F_{i}^{U} = S_{i}^{U} + T_{i}^{U}$

$F_{i}^{L} = S_{i}^{L} + T_{i}^{L}$

$E_{i}^{U} = X_{i}^{U} - F_{i}^{L}$

$E_{i}^{L} = X_{i}^{L} - F_{i}^{U}$

Unfortunately, reconstruction errors can accumulate exponentially with this approach and cause the resulting bounds E_(i) ^(L) and E_(i) ^(U) to be too loose to be useful. The looseness of the forecast error bounds produced by the naïve solution can be seen, for example, in the case where X_(i) ^(L)=0 and X_(i) ^(U)=1.
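The growth of the naïve bounds can be checked numerically with a small sketch. The function name, the smoothing parameters (0.5 for both a and β) and the zero initial state S₁ = T₁ = 0 are assumptions chosen for illustration; the upper forecast bound is taken as S^(U) + T^(U) and the lower one as S^(L) + T^(L).

```python
def naive_error_bounds(x_lo, x_hi, a, beta):
    """x_lo[k], x_hi[k] bound X_(k+1). Returns [(E_lo, E_hi)] for intervals 2..n."""
    s_lo = s_hi = t_lo = t_hi = 0.0  # assumed start: S_1 = T_1 = 0
    out = []
    for i in range(1, len(x_lo)):
        new_s_hi = a * x_hi[i - 1] + (1 - a) * (s_hi + t_hi)
        new_s_lo = a * x_lo[i - 1] + (1 - a) * (s_lo + t_lo)
        # The trend bounds mix the new baseline bound with the opposite old one.
        t_hi, t_lo = (beta * (new_s_hi - s_lo) + (1 - beta) * t_hi,
                      beta * (new_s_lo - s_hi) + (1 - beta) * t_lo)
        s_hi, s_lo = new_s_hi, new_s_lo
        f_hi, f_lo = s_hi + t_hi, s_lo + t_lo
        out.append((x_lo[i] - f_hi, x_hi[i] - f_lo))
    return out


bounds = naive_error_bounds([0.0] * 30, [1.0] * 30, a=0.5, beta=0.5)
widths = [hi - lo for lo, hi in bounds]
```

With these parameters the width of the interval [E^(L), E^(U)] keeps increasing from one interval to the next, even though every input X lies in [0, 1].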

In one embodiment, the present invention can obtain tight bounds by directly representing S_(i) and T_(i) as linear combinations of X_(j) (j≤i) and then incorporating the bounds X_(i) ^(L) and X_(i) ^(U). More specifically, let

$S_{i} = \sum\limits_{j = 1}^{i - 1} s[i,j]\, X_{j} \quad \text{and} \quad T_{i} = \sum\limits_{j = 1}^{i - 1} t[i,j]\, X_{j}.$

We can compute s[i, j] and t[i, j] recursively as follows:

${s\lbrack {i,j} \rbrack} = \begin{matrix}{a} & {j = {i - 1}} \\{\{ {( {1 - a} )( {{s\lbrack {{i - 1},j} \rbrack} + {t\lbrack {{i - 1},j} \rbrack}} )} } & {j < {i - 1}}\end{matrix}$t[i, j] = β(s[i, j] − s[i − 1, j]) + (1 − β)t[i − 1, j]

We can prove by induction that s[i, j]=s[i−1, j−1] and t[i, j]=t[i−1, j−1] for all j>2 (proof omitted in the interest of brevity). So when we increment i we only need to compute s[i, j] and t[i, j] for j≤2. Once we have s[i, j] and t[i, j], let f[i, j]=s[i, j]+t[i, j]. We then compute the forecast error bounds E_(i) ^(L) and E_(i) ^(U) as

$E_{i}^{U} = X_{i}^{U} - \sum\limits_{j:\, f[i,j] > 0} f[i,j] \cdot X_{j}^{L} - \sum\limits_{j:\, f[i,j] < 0} f[i,j] \cdot X_{j}^{U}$

$E_{i}^{L} = X_{i}^{L} - \sum\limits_{j:\, f[i,j] > 0} f[i,j] \cdot X_{j}^{U} - \sum\limits_{j:\, f[i,j] < 0} f[i,j] \cdot X_{j}^{L}$
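The coefficient recursion and the resulting bounds can be sketched as follows. The function names, the 1-based list layout (index 0 unused), the parameter values and the zero initial state implied by the empty sums for i = 1 are assumptions of this sketch. The helper `direct_errors` is included only as a sanity check that the linear-combination form reproduces the ordinary recursion when the series is known exactly.

```python
def coefficients(n, a, beta):
    """Return s, t with s[i][j], t[i][j] for 1 <= j < i <= n (other entries stay 0)."""
    s = [[0.0] * (n + 1) for _ in range(n + 1)]
    t = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(2, n + 1):
        for j in range(1, i):
            s[i][j] = a if j == i - 1 else (1 - a) * (s[i - 1][j] + t[i - 1][j])
            t[i][j] = beta * (s[i][j] - s[i - 1][j]) + (1 - beta) * t[i - 1][j]
    return s, t


def error_bounds(i, x_lo, x_hi, s, t):
    """Bounds (E_lo, E_hi) for interval i, pairing each coefficient with the worst-case X_j."""
    e_lo, e_hi = x_lo[i], x_hi[i]
    for j in range(1, i):
        f = s[i][j] + t[i][j]
        if f > 0:
            e_hi -= f * x_lo[j]
            e_lo -= f * x_hi[j]
        elif f < 0:
            e_hi -= f * x_hi[j]
            e_lo -= f * x_lo[j]
    return e_lo, e_hi


def direct_errors(x, a, beta):
    """Ordinary recursion on an exact series x[1..n]; returns {i: E_i}."""
    s = t = 0.0
    errors = {}
    for i in range(2, len(x)):
        new_s = a * x[i - 1] + (1 - a) * (s + t)
        t = beta * (new_s - s) + (1 - beta) * t
        s = new_s
        errors[i] = x[i] - (s + t)
    return errors


s, t = coefficients(30, a=0.5, beta=0.5)
lo, hi = error_bounds(30, [0.0] * 31, [1.0] * 31, s, t)
```

For the same inputs in [0, 1] used to illustrate the naïve solution, the width hi − lo here stays small because the coefficients f[i, j] decay as i − j grows.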

The present solution yields very tight bounds.

Note that the above solution requires keeping the entire interval series [X_(i) ^(L), X_(i) ^(U)]. Our solution is simply to ignore the remote past. This is reasonable as the use of exponential smoothing means the remote past has very little effect on predicting the future. That is, f[i, j] becomes very small when i−j is sufficiently large. As a result, we only need to keep state for the most recent few intervals for each flow.

FIG. 9 illustrates a flowchart of a method 900 for detecting a multi-dimensional hierarchical heavy hitter. Method 900 starts in step 905 and proceeds to step 910.

In step 910, method 900 sets a threshold, e.g., T_(split). This threshold is used to determine when a node will split.

In step 920, method 900 selects a plurality of keys, e.g., associated with IP source address, IP destination address, port number and the like, for a trie data structure, e.g., 200 of FIG. 2.

In step 930, the trie data structure is updated. For example, for each packet received, the trie data structure is updated by finding the longest matching prefix and incrementing the volume of the relevant node. It should be noted that step 930 is repeated for a predefined period of time in accordance with the specific requirement of a particular implementation. In other words, within a certain defined period or time interval, e.g., one minute, five minutes, one hour, and so on, packets are received and the trie data structure is updated for each received packet.

In step 940, method 900 reconstructs or aggregates the volume for each of the internal nodes. For example, at the end of a time interval, the present invention performs a recursive post-order traversal of the trie structure.

In step 950, method 900 estimates the missed traffic corresponding to each node, since not all packets are captured and analyzed. Various methods for estimating missed traffic can be used, e.g., the copy-all method, the no-copy method and the splitting method as discussed.

In step 960, method 900 detects the HHHs. For example, since method 900 now has the observed traffic and the estimated missed traffic for a node, it can now combine the observed and estimated missed traffic, where the combined traffic can be compared with a historical or predicted measure of total traffic for that node. Method 900 is then able to determine the HHH(s).

Once the HHHs are detected, method 900 in step 970 can implement any number of change detection methods to detect changes or anomalous events in the network. The important aspect is that once the network is able to determine the HHHs, it is better equipped to detect anomalous events accurately and efficiently. Method 900 may proceed to perform other post analysis or functions, e.g., a reporting function, and the like. Method 900 ends in step 980.

FIG. 10 illustrates the flowchart of the current invention. The invention provides a method for making a conservative estimate of the traffic and sets the threshold (1010). The threshold, the keys and the fields for detecting anomalies are selected (1020). For every received packet a determination is made on whether it needs to be split into smaller packets (1030). The packets that are too large are split (1040). The longest matching prefixes are then determined in each dimension (1050). For the selected method, the method then looks up the hash tables and updates the volumes (1060). A post-order traversal is then performed to reconstruct the volumes of the internal nodes (1070). The volume of the missed traffic is estimated (1080) and the result is added to the volume of the observed traffic (1090). The result is then used as an input to the HHH detection module (1092) and also fed back to the threshold determination module. The output of the HHH detection module is tracked across time to search for anomalies (1094).

FIG. 10 depicts a high-level block diagram of a general purpose computer suitable for use in performing the functions described herein. As depicted in FIG. 10, the system 1000 comprises a processor element 1002 (e.g., a CPU), a memory 1004, e.g., random access memory (RAM) and/or read only memory (ROM), a multidimensional hierarchical heavy hitter detection module 1005, and various input/output devices 1006 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, and a user input device (such as a keyboard, a keypad, a mouse, and the like)).

It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a general purpose computer or any other hardware equivalents. In one embodiment, the present multidimensional hierarchical heavy hitter detection module or process 1005 can be loaded into memory 1004 and executed by processor 1002 to implement the functions as discussed above. As such, the present multidimensional hierarchical heavy hitter detection method 1005 (including associated data structures) of the present invention can be stored on a computer readable medium or carrier, e.g., RAM memory, magnetic or optical drive or diskette and the like.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

1. A method for detecting at least one hierarchical heavy hitter from a stream of packets, comprising: receiving at least one packet from said stream of packets; associating at least two keys with at least two fields of said at least one packet; applying an adaptive trie data structure, where each node of said adaptive trie data structure is associated with said at least two keys; and using said adaptive trie data structure to determine said at least one hierarchical heavy hitter.
2. The method of claim 1, wherein said stream of packets are received from a packet network.
3. The method of claim 2, wherein said packet network is a Voice over Internet Protocol (VoIP) network.
4. The method of claim 1, wherein said applying said adaptive trie data structure comprises: updating said adaptive trie data structure for each received packet.
5. The method of claim 4, wherein said updating comprises: updating a volume of at least one node in said adaptive trie data structure.
6. The method of claim 4, wherein said updating comprises: determining whether an additional node is to be added into said adaptive trie data structure in accordance with a threshold.
7. The method of claim 1, further comprising: applying said at least one detected hierarchical heavy hitter to perform change detection.
8. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform the steps of a method for detecting at least one hierarchical heavy hitter from a stream of packets, comprising: receiving at least one packet from said stream of packets; associating at least two keys with at least two fields of said at least one packet; applying an adaptive trie data structure, where each node of said adaptive trie data structure is associated with said at least two keys; and using said adaptive trie data structure to determine said at least one hierarchical heavy hitter.
9. The computer-readable medium of claim 8, wherein said stream of packets are received from a packet-switch network.
10. The computer-readable medium of claim 9, wherein said packet-switch network is a Voice over Internet Protocol (VoIP) network.
11. The computer-readable medium of claim 8, wherein said applying said adaptive trie data structure comprises: updating said adaptive trie data structure for each received packet.
12. The computer-readable medium of claim 11, wherein said updating comprises: updating a volume of at least one node in said adaptive trie data structure.
13. The computer-readable medium of claim 11, wherein said updating comprises: determining whether an additional node is to be added into said adaptive trie data structure in accordance with a threshold.
14. The computer-readable medium of claim 8, further comprising: applying said at least one detected hierarchical heavy hitter to perform change detection.
15. An apparatus for detecting at least one hierarchical heavy hitter from a stream of packets, comprising: means for receiving at least one packet from said stream of packets; means for associating at least two keys with at least two fields of said at least one packet; means for applying an adaptive trie data structure, where each node of said adaptive trie data structure is associated with said at least two keys; and means for using said adaptive trie data structure to determine said at least one hierarchical heavy hitter.
16. The apparatus of claim 15, wherein said stream of packets are received from a packet network.
17. The apparatus of claim 15, wherein said means for applying said adaptive trie data structure comprises: means for updating said adaptive trie data structure for each received packet.
18. The apparatus of claim 17, wherein said means for updating comprises: means for updating a volume of at least one node in said adaptive trie data structure.
19. The apparatus of claim 17, wherein said means for updating comprises: means for determining whether an additional node is to be added into said adaptive trie data structure in accordance with a threshold.
20. The apparatus of claim 15, further comprising: means for applying said at least one detected hierarchical heavy hitter to perform change detection.